I’ll begin with a quote from Google’s Guidelines on Cloaking:
Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.
There are two critical pieces in that sentence – “may” and “user agent.” Now, it’s true that if you cloak in the wrong ways, with the wrong intent, Google (and the other search engines) “may” remove you from their index, and if you do it egregiously, they certainly will. But in many cases, cloaking is the right thing to do, both from a user experience perspective and from an engine’s.
To start, let me list a number of web properties that currently cloak without penalty or retribution.
- Google – Search for “google toolbar” or “google translate” or “adwords” or any number of Google properties and note how the URL you see in the search results and the one you land on almost never match. What’s more, on many of these pages, whether you’re logged in or not, you might see different content from what’s in the cache.
- NYTimes.com – The interstitial ads, the request to login/create an account after 5 clicks, and the archive inclusion all show different content to engines than to humans.
- Forbes.com – Even the home page can’t be reached without first viewing a full-page interstitial ad, and Google’s “cached text” of most pages is vastly different from the content humans see.
- Wine.com – In addition to some redirection based on your path, there’s the state overlay forcing you to select a shipping location prior to seeing any prices (or any pages). That’s a form the engines don’t have to fill out.
- WebmasterWorld.com – Pioneers of the now permissible and tolerated “first click free,” Googlebot (and only GGbot from the right set of IP addresses) is allowed access to thousands of clicks without any registration (there’s a sketch of that kind of IP verification just after this list).
- Yelp.com – Geotargeting through cookies based on location; a very, very popular form of local targeting that hundreds, if not thousands, of sites use.
- Amazon.com – In addition to the cloaking issues that were brought up on the product pages at SMX Advanced, Amazon does lots of fun things with their buybox.amazon.com subdomain and with the navigation paths & suggested products if your browser accepts cookies.
- iPerceptions.com – The site itself doesn’t cloak, but their pop-up overlay is only seen by cookied humans, and appears on hundreds of sites (not to mention it’s a project of one of Google’s staffers).
- InformationWeek.com – If you surf as Googlebot, you’ll get a much more streamlined, less ad-intensive, interstitial-free browsing experience.
- ComputerWorld.com – Interstitials, pop-ups, and even some strange javascript await the non-bot surfers.
- ATT.com – Everyone who hits the URL gets a unique landing page with different links and content.
- Salon.com – No need for an ad-sponsored “site pass” if you’re Googlebot :)
- CareerBuilder.com – The URLs you and I see are entirely different from the ones the bots get.
- CNet.com – You can’t even reach the homepage as a human without seeing the latest digital camera ad overlay.
- Scribd.com – The documents we see look pretty different (in format and accessibility) from the HTML text that’s there for the search engines.
- Trulia.com – As was documented just this past week, they’re doing some interesting redirects on partner pages and their own site.
- Nike.com – The 1.5 million URLs you see in Google’s index don’t actually exist if you’ve got Flash enabled.
- Wall Street Journal – Simply switching your user-agent to Googlebot gets you past all those pesky “pay to access” breaks after the first paragraph of the article.
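Incidentally, the reason a spoofed user-agent gets you in at some of these sites but not at WebmasterWorld is that WMW checks who’s actually asking rather than trusting the header. One widely used way to verify Googlebot is a reverse-then-forward DNS check on the requesting IP. Here’s a minimal Python sketch of that idea (the function name is mine; this isn’t any particular site’s implementation):

```python
import socket

def is_verified_googlebot(ip_address):
    """Confirm a crawler's identity with a reverse-then-forward DNS check.

    User-agent strings are trivially spoofed, so sites that open
    registration-walled content to Googlebot typically verify the
    requesting IP instead of trusting the header.
    """
    try:
        # Reverse lookup: the IP should resolve to a *.googlebot.com
        # or *.google.com hostname.
        hostname, _, _ = socket.gethostbyaddr(ip_address)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward lookup: that hostname should resolve back to the same
        # IP, otherwise the reverse record itself could be forged.
        return socket.gethostbyname(hostname) == ip_address
    except (socket.herror, socket.gaierror):
        return False
```

Only requests that pass both lookups would get the crawler treatment; everyone else sees the normal registration wall.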
This list could go on for hundreds more results, but the message should be clear. Cloaking isn’t always evil, it won’t always get you banned, and you can do some pretty smart things with it, so long as you’re either:
A) A big brand that Google’s not going to get angry with for more than a day or two if you step over the line, OR
B) Doing the cloaking in a completely white hat way with a positive intent for users and engines.
Here’s a visual interpretation of my personal cloaking scale:
Let’s run through some examples of each:
Pearly White – On SEOmoz, we have PRO content like our Q+A pages, link directory, PRO Guides, etc. These are available only to PRO members, so we show a snippet to search engines and non-PRO members, and the full version to folks who are logged into a PRO account. Technically, it’s showing search engines and some users different things, but it’s based on the cookie and it’s done in exactly the type of way engines would want. Conceptually, we could participate in Google News’s first-click free program and get all of that content into the engine, but haven’t done so to date.
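For those who want to see the mechanics, the gating logic is roughly this (a hypothetical Flask-style sketch with made-up article and session data – not our actual code):

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical stand-ins for a real content store and session table.
ARTICLES = {
    "pro-guide": {
        "teaser": "The first few paragraphs, visible to everyone...",
        "full": "The complete PRO guide, visible only to PRO members.",
    }
}
ACTIVE_PRO_SESSIONS = {"example-session-token"}

@app.route("/pro/<slug>")
def pro_article(slug):
    article = ARTICLES.get(slug)
    if article is None:
        return "Not found", 404
    # Crawlers and logged-out humans get exactly the same teaser; only a
    # valid PRO session cookie unlocks the full text, so the engine never
    # sees anything a logged-out visitor couldn't also see.
    if request.cookies.get("pro_session") in ACTIVE_PRO_SESSIONS:
        return article["full"]
    return article["teaser"]
```

The key point is that the branch keys off the login cookie, not off who’s asking – engines and anonymous humans land in the same bucket.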
Near White – Craigslist.org does some automatic geo-targeting to help determine where a visitor is coming from and what city’s page they’d want to see. Google reps have said publicly that they’re OK with this so long as Craigslist treats search engine bots the same way. But, of course, they don’t. Bots get redirected to a page that I can only see in Google’s cache (or if I switch my user agent). It makes sense, though – the engines shouldn’t be dropped onto a geo-targeted page; they should be treated like a user coming from everywhere (or nowhere, depending on your philosophical interpretation of Zen and the art of IP geo-location). Despite going against a guideline, it’s so extremely close to white hat, particularly from an intention and functionality point-of-view, that there’s almost no risk of problems.
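A rough sketch of that pattern – humans get geo-targeted, crawlers get the location-neutral page (the bot tokens and the city_for_ip lookup are illustrative stand-ins, not Craigslist’s code):

```python
from flask import Flask, request, redirect

app = Flask(__name__)

# Substrings identifying the major crawlers; a careful site would pair
# this with IP verification rather than trusting the header alone.
BOT_TOKENS = ("googlebot", "slurp", "msnbot")

def city_for_ip(ip_address):
    """Hypothetical geo-IP lookup; a real site would query a geo-IP database."""
    return "seattle"

@app.route("/")
def home():
    user_agent = request.headers.get("User-Agent", "").lower()
    if any(token in user_agent for token in BOT_TOKENS):
        # Crawlers are "coming from everywhere (or nowhere)," so they get
        # the generic city-chooser page rather than a redirect.
        return "Choose a city: ..."
    # Humans get a temporary redirect to the city page geo-IP suggests.
    return redirect(f"/{city_for_ip(request.remote_addr)}/", code=302)
```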
Light Gray – I don’t particularly want to “out” anyone who’s doing this now, so let me instead offer an example of when and where light gray would happen (if you’re really diligent, you can see a couple of the sites above engaging in this type of behavior). Imagine you’ve got a site with lots of paginated articles on it. The articles are long – thousands of words – and even from a user experience point-of-view, the breakup of the pages is valuable. But each page is getting linked to separately, and there’s a “view on one page” URL, a “print version” URL, and an “email a friend” URL that are all getting indexed. Often, when an article’s interesting, folks will pick it up on services like Reddit and link to the print-only version, or to an interior page of the article in the paginated version. The engines are dealing with duplicate content out the wazoo, so the site detects engines and 301s all the different versions of the article back to the original “view on one page” source, but drops visitors who click that search result onto the first page of the paginated version.
Once again, the site is technically violating guidelines (and a little more so than in the near-white example), but it’s still well-intentioned, and it really, really helps engines like MSN & Ask.com, who don’t do a terrific job with duplicate content detection and canonicalization (and, to be fair, even Yahoo! and Google get stuck on this quite a bit). So – good intentions + positive user experience that meets expectations + use of a proclaimed shady tactic = light gray. Most of your big brand sites can get away with this ad infinitum.
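To make the light gray mechanics concrete, here’s a rough sketch of the detect-and-301 pattern described above (the URL structure and the user-agent check are illustrative, not lifted from any particular site):

```python
from flask import Flask, request, redirect

app = Flask(__name__)

BOT_TOKENS = ("googlebot", "slurp", "msnbot")

def is_crawler():
    user_agent = request.headers.get("User-Agent", "").lower()
    return any(token in user_agent for token in BOT_TOKENS)

# The print and paginated versions of an article.
@app.route("/articles/<int:article_id>/print")
@app.route("/articles/<int:article_id>/page/<int:page>")
def article_variant(article_id, page=1):
    if is_crawler():
        # Engines get a permanent redirect to the single canonical
        # "view on one page" URL, consolidating the duplicate versions.
        return redirect(f"/articles/{article_id}/all", code=301)
    # Humans still get the print or paginated page they asked for.
    return f"Article {article_id}, page {page} (human view)"

# The canonical "view on one page" version.
@app.route("/articles/<int:article_id>/all")
def article_canonical(article_id):
    if is_crawler():
        # This is the version the engines index and rank.
        return f"Article {article_id}, full text on one page"
    # Humans who click that search result get dropped onto page one of
    # the paginated version instead.
    return redirect(f"/articles/{article_id}/page/1", code=302)
```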
Dark Gray – Again, I’ll give a hypothetical rather than call someone out. There are many folks who participate in affiliate programs, and the vast majority of these send their links through a redirect in Javascript, both to capture the click for their tracking purposes and to stop link juice from passing. Some savvier site owners have realized how valuable that affiliate link juice can be and have set up their own affiliate systems that do pass link juice, often by collecting links to unique pages, then 301’ing those for bots, passing the benefit of the links on to pages on their domain where they need external links to rank. The more crafty ones even sell or divide a share of this link juice to their partners or the highest bidder. This doesn’t necessarily affect visitors who come seeking what the affiliate’s linked to, but it can create some artificial ranking boosts, as the engines don’t want to count affiliate links in the first place, and certainly don’t want them helping pages they never intended to receive their traffic.
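Purely for illustration, here’s how that kind of link juice funneling might be wired up (the affiliate IDs, target pages, and product URL are all invented):

```python
from flask import Flask, request, redirect

app = Flask(__name__)

BOT_TOKENS = ("googlebot", "slurp", "msnbot")

# Hypothetical mapping of affiliate IDs to the internal pages whose
# rankings the site owner wants those inbound links to boost.
JUICE_TARGETS = {"partner-123": "/category/gift-baskets"}

@app.route("/affiliate/<affiliate_id>")
def affiliate_link(affiliate_id):
    user_agent = request.headers.get("User-Agent", "").lower()
    if any(token in user_agent for token in BOT_TOKENS):
        # Bots get a permanent redirect, so the value of the external
        # affiliate link is funneled to a page the site wants to rank --
        # exactly what the engines don't want affiliate links doing.
        return redirect(JUICE_TARGETS.get(affiliate_id, "/"), code=301)
    # Humans get a temporary redirect to the product they expected,
    # with the affiliate recorded for commission tracking.
    response = redirect("/products/expected-item", code=302)
    response.set_cookie("ref", affiliate_id)
    return response
```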
Solid Black – Since I found some pure spam that does this, I thought I’d share. I recently performed a search at Google for inurl:sitemap.xml, hoping to get an estimate of how many sites use sitemaps. In the 9th position, I found an odd URL – www.acta-endo.ro/new/viagra/sitemap.xml.html – which redirects humans to a page on pharmaceuticals. Any time a search result misleadingly takes you to content that the site not only hides from the engine, but that isn’t relevant to your search query, I consider it solid black.
Now for a bit of honesty – we’ve recommended pearly white, near white, and yes, even light gray to our clients in the past and we’ll continue to do so in the future when and where it makes sense. Search engine reps may decry it publicly, but the engines all permit some forms of cloaking (usually at least up to light gray) and even encourage it from brands/sites where it provides a better, more accessible experience.
The lesson here is: don’t be scared away from a tactic just because you hear it might be black hat or gray hat. Do your own research, form your own opinions, test on non-client sites, and do what makes the most sense for your business and your client. The only thing we have to fear is fear itself (and overzealous banning, but that’s pretty rare). :)
p.s. The takeaway from this post should not be “cloak your site.” I’m merely suggesting that inflexible, pure black-and-white positions on cloaking deserve some re-thinking.